Master equation approach to protein folding 
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The dynamics of two 12-monomer heteropolymers on the square lattice is studied exactly within 
the master equation approach. The time evolution of the occupancy of the native state is deter- 
mined. At low temperatures, the median folding time follows the Arrhenius law and is governed 
by the longest relaxation time. For both good and bad folders, significant kinetic traps appear in 
the folding funnel and the kinetics of the two kinds of folders are quite similar. What distinguishes 
between the good and bad folders are the differences in their thermodynamic stabilities. 
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A denatured protein folds into a compact native 
state in a time of order a millisecond after the phys- 
iological conditions are restored Q. This process is 
reversible and the native state is believed to coincide 
with the ground state of the system. Studies of sim- 
plified lattice models of proteins have provided many 
insights into the dynamics of the folding process [Q . 
The crucial feature that makes lattice models use- 
ful is the possibility of enumerating the complete set 
of conformations and determining which of them is 
the ground state. This can be accomplished, if the 
length of the heteropolymer, N, is small. Such stud- 
ies have elucidated the role of thermodynamic stabil- 
ity ^J,[| , stability against mutations Q and existence 
of a linkage between rapid folding and the stability 
of the native state || . 

The key concept that has been introduced to ex- 
plain the rapid folding that occurs in natural pro- 
teins is that of the folding funnel J5|,^| - a set of 
conformations that are smoothly connected to the 
native state, as indicated schematically at the top of 
Figure 1. It is expected that for random sequences 
of aminoacids there are competing basins of attrac- 
tion which would trap the system away from the na- 
tive state. This corresponds schematically to what 
is shown at the bottom of Figure 1. The identi- 
fication of states that belong to the funnel is not 
easy due to an enormous number of conformations 
that are present even for a small N and because 
the problem is dynamical one. Possible approaches 
include the monitoring of frequences of passages be- 
tween various states in Monte Carlo trajectories || 
or the mapping _of_ states into underlying valleys of 
effective states 





FIG. 1. A schematic representation of the phase 
space properties of good (top) and bad (bottom) fold- 
ers. The vertical axis corresponds to energy and the 
horizontal axis to a "coordinate" in the phase space. 

Essentially all approaches to the studies of the 
folding dynamics have been restricted to Monte 
Carlo simulations that start from a few randomly 
chosen initial conformations || . The only exception 
has been an approach due to Chan and Dill |lCj in 
which an enumeration of transition rates between 
classes of conformations which have the same num- 
ber of contacts and are a given number of kinetic 
steps away from the native state. 

Recently, we presented an exact method to study 
the dynamics of short model proteins [O which was 
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based on the master equation jT3| . We have specifi- 
cally considered two N—12 sequences, called A and 
B, which are placed on a square lattice. The dy- 
namics of these sequences can be studied exactly 
because the sequences can acquire only Af= 15037 
conformations. The two sequences have the same 
set of contact energies Bij but their assignment to 
various monomers i and j is different with the re- 
sult that A is a good folder and B is a bad folder. 
The basic finding of |t0| is that the qualitative pic- 
ture corresponding to Figure 1 is indeed correct. By 
identifying kinetic trap states that are responsible 
for the slowest dynamical processes in the system 
we could demonstrate that the traps for sequence A 
are within the folding funnel whereas for sequence B 
the relevant traps form a valley which competes with 
the native valley in analogy to the bottom of Figure 
1. In the current paper we provide a deeper charac- 
terization and comparisons of the two sequences and 
elucidate the nature of the trap states. 
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U = 2J BijAij 



(1) 



where Ay indicates that the contact interaction 
is assigned to monomers which arc geometrical near- 
est neighbors on the lattice but are not neighbors 
along the sequence. In such arrangements Ay is 
equal to 1, otherwise it is 0. The values of the 25 
couplings are chosen as Gaussian numbers which are 
centered around -1 to provide an overall attraction, 
as detailed in jl0| . The ground state of sequence A 
is maximally compact and it fills the 3x4 lattice, as 
shown at the top of Figure 2. For sequence B, the 
ground state, shown at the top of Figure 3, is doubly 
degenerate. Both of the ground states are compact 
but not maximally compact. They differ merely by 
a placement of one end monomer and therefore they 
were considered as an effective single native state. 




FIG. 2. Top: The native conformation and its en- 
ergy for sequence A. The enlarged circle shows the first 
monomer. Main: Dynamical data for the folding. The 
solid line marked by tfoid gives the median folding time 
derived from 1000 Monte Carlo trajectories. The solid 
line ti is the longest relaxation time. The broken line ti 
with the black circles gives the time for Po(t) to reach 
^Po from the uniformly random initial state. The dotted 
line tA is a fit of the Monte Carlo data to the Arrhenius 
law with 8E=2.76. The arrow at the top indicates the 
value of the folding transition temperature. 

The sequences that we study are described by the 



FIG. 3. Same as Figure 2, but for sequence B. For the 
curve tA, 6E=3.55. There are two native conformations 
with the same energy. 

We have found that the dynamics of A and B are 
superficially similar: for both, the median folding 
time, tfoidi and the longest relaxation time, n di- 
verge at low T according to an Arrhenius law. The 
temperature T min at which folding to the native state 
proceeds the fastest is about the same for both se- 
quences. It is the location of the folding transition 
temperature, Tf, with respect to T m ; n which distin- 
guishes between sequences A and B. Tf is defined 
as the temperature at which the equilibrium value 
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of the probability to occupy the native state, Pq, 
crosses i and is a measure of thermodynamic stabil- 
ity. For bad folders, Tf is well below T mm and thus 
a substantial occupation probability for the native 
state is found only in a temperature range in which 
the dynamics are glassy. For sequence A, the values 
of Tf and T m ; n are 0.71 and 1.0 ± 0.1, respectively, 
while for sequence B, the corresponding values are 
0.01 and 1.1 ± 0.1. Thus the two sequences are dy- 
namically similar but the equilibrium properties dif- 
fer dramatically. 

Methods: We study our sequences through an 
analysis of the master equation and then compare 
the results to those obtained by the Monte Carlo 
approach. The master equation does not deal with 
a specific trajectory but with an ensemble of trajec- 
tories and it reads 

BP 

= £ [w(/3 -» a)P p - w(a - l3)P a ] (2) 

where P a = P a (t) is the probability of finding the 
sequence in conformation a at time t. The quantity 
w a p = w{/3 — ► a) is the transition rate from con- 
formation (3 to conformation a. Of course, writing 
a master equation relies on the assumption that a 
Markov chain description adequately describes pro- 
tein kinetics, in particular, memory effects are as- 
sumed to be negligible. The actual derivation of a 
master equation remains a formidable problem in it- 
selfQ. 

One may bring this into a matrix form by letting 
P={P 1 ,...,P N ) and 

m a p = -w a p < if a ^ , m aa = ^ w Pa ■ 

(3) 

The master equation then takes the form of an 
imaginary-time Schrodinger equation [ |D3||l4j ] 

d t P = —MP , (4) 

where the m a p are the matrix elements of M, 

The time-dependence state vector P{t) at time 
t = utq can be obtained by applying n times the 
recursion P((n + 1)t ) = (1 — r M)P(nT ) to Pi n , 
where To is a microscopic time associated with a sin- 
gle move and Pj n is some initial probability distribu- 
tion. On the other hand, the spectrum of relaxation 
times t\ — l/(Re M\) > follows directly from the 
eigenvalues M\ of M. 



The transition rates uu a /3 are designed so that a 
stationary solution of the master equation corre- 
sponds to equilibrium with the Hamiltonian given 
by eq. (@). This can be accomplished provided the 
detailed balance condition 

w aP Pp = Wf, a P? (5) 

is satisfied. Here, P° q ~ er Ea l T is a stationary 
solution of the master equation and E a is the en- 
ergy of the sequence in conformation a. Eq. (^) 
is satisfied by any w a p = / Q /3exp [—{E a — Ep)/2T] 
provided only that f a p = fp a . Here, we choose 

w a p = w„p + , where 

w a0 = —R* COsh K E ° ~ E P) IT] (6) 
TO 

with R\ + i?2 = 1. Here, a — 1 and 2 refer to the 
single- and double-monomer moves respectively. It is 
understood that — if there is no move of type 
a linking (3 with a. The choice (^) guarantees that 
transition rates are finite and bounded for all tem- 
peratures. In analogy to ||, we focus on Ri = 0.2 
and take the single and two-monomer (crankshaft) 
moves as in ]Tc| . 

Because of the detailed balance condition, the 
eigenvalues E a are not calculated by diagonalizing 
M directly, but by diagonalizing an auxiliary matrix 
K with elements 

k a fj = —m a p = kpa , (7) 
v a 

where v a = e ~ E ^/ 2T . The master equation can be 
then rewritten as 

d t Q = -KQ , (8) 

where K is real symmetric and Q a = P a /v a . This 
equation is then diagonalized using the standard 
symmetric Lanczos algorithm without reorthogonal- 
ization, as described in jl5|,[l6| and then the eigen- 
modes of M arc determined. In particular, since K 
and M are related by a similarity transformation, 
all eigenmodes M\ are real. 

The eigenvector corresponding to M\ — 0, i.e. to 
the infinite relaxation time, determines the equilib- 
rium occupancies of the conformations. This eigen- 
vector needs to be normalized to satisfy its proba- 
bilistic interpretation. The longest finite relaxation 
time Ti = 1 /M% is found from the smallest non-zero 
eigenvalue M\. 

Our Monte Carlo simulations have been done in 
a way that satisfies the detailed balance conditions 
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and were devised along the lines described in JTo| . 
For polymers, satisfying these conditions is non- 
trivial because each conformation has its own num- 
ber, K, of allowed moves that the conformation can 
make. Thus the propensities to make a move in a 
unit time vary from conformation to conformation 
and the effective "activities" of the conformations 
need to be matched. This can be accomplished by 
first determining the maximum value of K, K max . 
For the 12-monomer chain on the square lattice, 
K max is equal to 14. We associate a single time unit 



with the conformations in which K=K n 



This 



means that each allowed move is being attempted al- 
ways with probability 1/K max . For a conformation 
with K allowed moves, the probability to attempt 
any move is then K/K max and the probability not 
to carry out any attempt is 1 — K/K max . The 
attempted moves are then accepted or rejected as 
in the standard Metropolis procedure. The proba- 
bility of a single monomer move (an end flip or a 
switch in a pair of bonds that make the 90° angle) is 
additionally reduced by the factor of 0.2 and an al- 
lowed double monomer crankshaft move by 0.8. The 
time is then measured in terms of the total num- 
ber of the Monte Carlo attempts divided by i^ ma x- 
This scheme not only establishes the detailed bal- 
ance conditions but it also uses less CPU compared 
to a process in which moves are attempted without 
regard to whether they are allowed or not. 

Results: Figure 4 shows the 4 longest finite re- 
laxation times for sequence A as a function of tem- 
perature. It is seen that they are all roughly pro- 
portional to each other and they all appear to di- 
verge at T=0, albeit with differing energy barriers. 
The longest relaxation time follows the Arrhenius 
law, n ~ e SEl / T with 5E X of about 4.1. The 
Monte Carlo derived median folding time also fol- 
lows the Arrhenius law at low temperatures but with 
5E = 2.76 ± 0.06. The overall Arrhenius-like behav- 
ior suggests that the physics of folding at low T's is 
dominated by processes that establish equilibrium. 
Their longest time scale is then set by t\ . At temper- 
atures above T m ; n , tf \d no longer follows t\. Here, 
the physics of folding is dominated by the statistics 
of rare events: the system establishes equilibrium 
rapidly and then folding is reached by a search in 
equilibrium. The search becomes more and more 
random when T becomes larger and larger. For se- 
quence B, similar results are obtained but the Ar- 
rhenius barrier in tfoid is 3.55 ±0.06, whereas 5E\ = 
4.0±0.06. Figures 2 and 3 show fits to the Arrhenius 
law for sequences A and B respectively. 
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FIG. 4. The four longest relaxation times versus T 
for sequence A. The very longest, t\, is drawn by a solid 
line whereas the remaining three by the dotted lines. The 
value of Tf is indicated by an arrow and the Monte Carlo 
data on tfdd are repeated from Figure 2 for a comparison. 

We now focus on the time evolution of the prob- 
ability, Po(t), to occupy the native state, where t 
denotes time. This probability depends on the ini- 
tial conditions. The solid lines in Figure 5 show 
the evolution of Po(t) when in the initial state all 
conformations have equal probability of 1/Af to be 
occupied. The main figure is for sequence A and the 
time evolution is shown for three temperatures: 1.2, 
0.9 and 0.6. The data for the lowest of these tem- 
peratures are also compared with the Monte Carlo 
evolution. The Monte Carlo data, obtained by start- 
ing from random conformation, agree with the exact 
evolution. The time scales at which the equilibrium 
saturation values of Pq are reached are of order n . 
The top portion of Figure 5 shows Po(t) for sequence 
B. It is seen that the saturation levels are signifi- 
cantly lower for B than for A at the same tempera- 
tures. This reflects a significantly lower value of Tf 
for B. We also observe, by looking at Figures 2 and 
3 that the median folding time at low temperatures 
appears to coincide with the time needed for Po(t) 
to reach half of its equilibrium value. 
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FIG. 5. Probability of occupation of the native state, 
Po(t), of sequence A, for three values of the tempera- 
tures, indicated on the right. Po(oo) agrees with the 
equilibrium value. The solid lines correspond to the uni- 
formly random initial condition and the dotted lines cor- 
respond to an initial placement of the system in the trap 
state. The squares correspond to Monte Carlo results, 
based on 200 random starting conformations. The top 
figure shows Po(t) for sequence B. The initial condition 
is that of uniform randomness. The y-axis scale is indi- 
cated on the right. 



ceding the native state kinctically, may also posses 
a significant occupancy but it does not constitute a 
trap. Thus the search for the most relevant trap goes 
after non-native local energy minima which have the 
biggest occupancy. In the limit of T — > 0, weights 
associated with all other local energy minima states 
become small. 

We now focus on the interpretation of the trap 
states to demonstrate the qualitative validity of Fig- 
ure 1. We place the system in the trap state and 
ask what is the best trajectory which leads from the 
trap state to the native state. The best trajectory is 
defined to be one in which the highest energy state 
is lower than the highest energy state on all other 
trajectories. 
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The dotted line in Figure 5 corresponds to the ini- 
tial state being the kinetic "trap" conformation 
which is the strongest obstacle in reaching equilib- 
rium. If the system starts in the trap state, it will 
still reach the final equilibrium in a time scale set by 
Ti but at shorter times the occupancy of the native 
state is significantly less than when starting from a 
totally random state. 

In general, it is not easy to determine which con- 
formation forms the most potent trap. One may 
attempt to determine it by monitoring occupation 
of the local energy minima in a Monte Carlo pro- 
cess performed at very low temperatures. This ap- 
proach has been successfully applied in ||. Here, the 
trap state is determined in an exact way by studying 
the eigenvector corresponding to the longest relax- 
ation time and by identifying the local energy min- 
ima which contribute the most to this eigenvector at 
low T. The largest weight is associated with the na- 
tive state. Usually, the second largest corresponds to 
the most relevant trap. This happens provided the 
second largest occupancy is in a local energy mini- 
mum state. A non-minimum state, immediately pre- 



MAX AE = 2.8823 
MAX ELEV. = 4.5323 

FIG. 6. The best trajectory linking the trap state of 
sequence A (shown at the top left) and the native state. 
The energies involved are shown below each conforma- 
tion. The biggest single step and global energy expenses 
are indicated at the bottom. 

Figure 6 shows the trap state (of energy -8.9627) 
for sequence A and the best corresponding trajectory 
that connects it to the native state. The trajectory 
reaches the native state in 8 steps and it elevates 
in energy by 4.5323, relative to the the energy of 
the trap state, after accomplishing the second step. 
Thus it takes only two uphil steps to enable a flow to 
the native state. The first of these steps breaks two 
contacts and it costs 2.8823 in energy. Thus this 
biggest single step energy cost is what determines 
the value of the energy barrier in the Arrhenius law 
for tfoid, as derived through the Monte Carlo tech- 
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nique. On the other hand, 5E\, that determines n, 
appears to be governed by the energy costs to make 
the initial two-step climb to the point of highest el- 
evation. In a Monte Carlo process, two consecutive 
climbs up are not very likely at low T's and can be 
missed. However, the exactly derived longest relax- 
ation time indicates the true nature of the barrier. 

The kinetics of sequence B is quite similar to that 
of sequence A. Thus what distinguishes between the 
two sequences are their thermodynamic stabilities as 
measured by Tf. Figure 7 shows the steps of the best 
trajectory from the the most important trap state for 
sequence B (of energy -8.4431) to the native state. 
The trap state is almost identical in shape to the 
two native conformations: it is only the positioning 
of the middle portion of the conformation that is not 
quite right. And yet it takes 10 steps, like in Figure 
6, to accomplish the transition between the trap and 
native states. The overall barrier that is climbed is 
equal to 4.4 which is close to the Arrhenius barrier 
5E\ in n of 4.0. The biggest single step energy cost 
of 2.7478 on the trajectory is, however, further away 
from 5E in tfoid of about 3.6. The important obser- 
vation is that for the bad folder B the most relevant 
trap conformation is still in the folding funnel of the 
native state. The presence of other low lying val- 
leys that compete with the native valley does not 
really affect the trapping effects in the native fun- 
nel. Instead it generates low stability of the native 
state through a substantial equlibrium probability 
of being in some other valleys. 
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FIG. 7. As in Figure 6 but for sequence B. 

Even though our exact results have been obtained 
in a toy model of only 12 monomers, it is expected 
that the essential physics of protein folding and the 
distinction between the good and bad folders are as 
illustrated here. Tools to study larger and off-lattice 
systems in this spirit remain to be developed. 
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